On the conditional logistic estimator 
for repeated binary outcomes 
Abstract

The behavior of the conditional logistic estimator is analyzed under a causal model for
two-arm experimental studies with possible non-compliance in which the effect of the
treatment is measured by a binary response variable. We show that, when non-compliance
may only be observed in the treatment arm, the effect (measured on the logit scale) of
the treatment on compliers and that of the control on non-compliers can be identified and
consistently estimated under mild conditions. The same does not happen for the effect of
the control on compliers. A simple correction of the conditional logistic estimator is then
proposed which allows us to considerably reduce its bias in estimating this quantity and
the causal effect of the treatment over control on compliers. A two-step estimator results
whose asymptotic properties are studied by exploiting the general theory on maximum
likelihood estimation of misspecified models. Finite-sample properties of the estimator are
studied by simulation and the extension to the case of missing responses is outlined. The
approach is illustrated by an application to a dataset deriving from a study on the efficacy
of a training course on the practice of breast self examination.

Key words: Causal inference; Counterfactuals; Potential outcomes; Pseudo-likelihood inference;
Sufficient statistics.
1 Introduction



Conditional logistic regression is a commonly used tool of data analysis in the health sciences 
and medical statistics when the outcome of interest is binary and subjects are observed before 
and after a certain treatment or these subjects are somehow matched; see, for instance, [1] , [2] , 
[3] and [4]. The main reasons for the popularity of the method are its simplicity
and the possibility of obtaining reliable estimates of the quantities of interest under very
mild conditions.

The first aim of this paper is to illustrate the behavior of the conditional logistic
estimator when data come from two-arm experimental studies with all-or-nothing compliance
in which the efficacy of the treatment is observed through a binary variable, and a pre-treatment
version of the same variable is available. We are then in a context of repeated binary outcomes
at two occasions, before and after the treatment (or control), for which non-compliance may
represent a strong source of confounding in estimating the causal effect of the treatment over
control. An example is the study described by [5] on the effect of a training
course on the attitude to practise breast self examination (BSE); see also [6]. In this study,
a significant number of women, among those randomly assigned to the treatment, did not
comply and preferred learning BSE by a standard method (control). Moreover, the efficacy of
the treatment is measured by a binary variable indicating if a woman regularly practises BSE
and another binary variable indicating if BSE is practised in the proper way (provided it is
practised). Pre-treatment values of these response variables are also available.

In order to study the behavior of the conditional logistic estimator in experimental studies of
the type described above, we introduce a causal model which includes parameters for the control
and treatment effects in the subpopulations of compliers and never-takers and assumes that only
the subjects assigned to the treatment can access it. Given the type of experimental studies,
we assume that always-takers do not exist; we also assume that defiers are not present. The
model is then related to the causal models described by [7] and [8]; see also [9]. It also allows
for the inclusion of baseline observable and unobservable covariates which affect the response
variables at the first and second occasions. We show that, under this model, the conditional
logistic method allows us to identify and consistently estimate the effect of the treatment on
compliers and that of the control on never-takers. However, apart from very particular cases,
this method does not allow us to identify the effect of control on the subpopulation of compliers
and hence the causal effect of the treatment over control on this subpopulation. As in other



approaches for causal inference, this effect is here measured on the logit scale; see [10], [11], [12] 
and [13]. 

Based on an approximation of the distribution of the observable variables under the causal
model, we then propose a correction for the conditional logistic estimator which allows us to
remove most of its bias in estimating the effect of the treatment on compliers. The result is a
two-step estimator which has some connection with the estimator usually adopted for the selection
model [14] and that proposed by [15] to estimate the causal effect of a treatment in a context
similar to the present one. At the first step, the parameters of a model for the probability
that a subject is a complier are estimated. At the second step, a conditional logistic likelihood
is maximized which is based on an approximated version of the conditional probability of the
response variables at the two occasions, given their sum. This likelihood is computed on the
basis of the first step parameter estimates. The proposed estimator is very simple to use and
is consistent when the control has the same effect on compliers and never-takers. This result
holds regardless of the model that we choose for the probability to comply. In the general case
in which compliers and never-takers react differently to control, the estimator is not consistent
but we show that it may converge in probability to a value surprisingly close to the true value
of the causal parameters as the sample size grows to infinity. We also derive a sandwich formula
for its standard error. As we show, with minor adjustments the two-step estimator leads to
valid inference even with missing responses.

The paper is organized as follows. In Section 2 we introduce the causal model for repeated 
outcomes coming from two-arm experimental studies. The behavior of the conditional logistic 
estimator under this model is studied in Section 3. The correction of this estimator is proposed 
in Section 4 where we also study the asymptotic and finite-sample properties of the resulting 
two-step estimator. In Section 5 we outline the extension of the approach to missing responses 
and in Section 6 we provide an illustration based on an application to the dataset deriving from 
the BSE study described above. Final conclusions are reported in Section 7 where possible
extensions are also mentioned, such as that to experimental studies in which subjects in both
arms can access the treatment and then non-compliance phenomena can be observed for all
subjects.



2 The causal model 



Let $Y_1$ and $Y_2$ denote the binary response variables of interest, let $V$ be a vector of observable
covariates, let $Z$ be a binary variable equal to 1 when a subject is assigned to the treatment and
to 0 when he/she is assigned to the control and let $X$ be the corresponding binary variable for
the treatment actually received. We recall that $V$ and $Y_1$ are pre-treatment variables, whereas
$Y_2$ is a post-treatment variable. Non-compliance of the subjects involved in the experimental
study implies that $X$ may differ from $Z$. In particular we consider experiments in which only
subjects randomized to the treatment can access it and therefore $Z = 0$ implies $X = 0$, whereas
with $Z = 1$ we may observe either $X = 0$ or $X = 1$. Using a terminology taken from [7], in this
case we have only randomized eligibility and we then consider two subpopulations: compliers
and never-takers. Nevertheless, the approach can be extended to randomized experiments in
which subjects in both arms can access the treatment and therefore any configuration of $(Z, X)$
may be observed; see Section 7 for a discussion on this point. In both types of experiment, we
assume that defiers are not present and our aim is to estimate the causal effect of the
treatment over control in the subpopulation of compliers.

We assume that the behaviour of a subject depends on the observable covariates $V$, a latent
variable $U$ representing the effect of unobservable covariates on both response variables and a
latent variable $C$ representing the attitude to comply with the assigned treatment. The last
one, in particular, is a discrete variable with two levels: 0 for never-takers, 1 for compliers.

The model is based on the following assumptions: 

A1: $C \perp Y_1 \mid (U, V)$, i.e. $C$ is conditionally independent of $Y_1$ given $(U, V)$;

A2: $Z \perp (U, Y_1, C) \mid V$;

A3: $X \perp (U, V, Y_1) \mid (C, Z)$ and, with probability 1, $X = Z$ when $C = 1$ (compliers) and $X = 0$
when $C = 0$ (never-takers);

A4: $Y_2 \perp (Y_1, Z) \mid (U, V, C, X)$;

A5: for any $u$, $v$, $c$ and $x$, we have

$$\mathrm{logit}[p(Y_2 = 1 \mid u, v, c, x)] - \mathrm{logit}[p(Y_1 = 1 \mid u, v)] = a(v, x)'\alpha + c(1 - x)\, b(v)'\beta, \qquad (1)$$

where $a(v, x)$ and $b(v)$ are known functions which depend on observable covariates and
(limited to the first) on the received treatment and $\alpha$ and $\beta$ are corresponding vectors of
parameters.



The above assumptions lead to a dependence structure on the observable and unobservable 
variables which is represented by the DAG in Figure 1. 
Figure 1: DAG for the model based on assumptions A1-A5. $U$ and $V$ represent unobservable
and observable covariates affecting the response variables $Y_1$ and $Y_2$, $C$ is a binary variable for
the compliance status and $Z$ and $X$ are binary variables for the assigned and received treatment.

Assumption A1 says that the tendency to comply only depends on $(U, V)$. Assumption A2 is
satisfied in randomized experiments, even when randomization is conditioned on the observable
covariates. This assumption could be relaxed by requiring that $Z$ is conditionally independent of
$U$ given $(V, Y_1)$, so that randomization can also be conditioned on the first outcome. Assumption
A3 is rather obvious considering that $C$ represents the tendency of a subject to comply with
the assigned treatment. Assumption A4 implies that there is no direct effect of $Y_1$ on $Y_2$, since
the distribution of the latter only depends on $(U, V, C, X)$. Using a terminology which is well
known in the literature on latent variable models, this is an assumption of local independence.
Assumption A4 also implies an assumption known in the causal inference literature as exclusion
restriction, according to which $Z$ affects $Y_2$ only through $X$. Finally, assumption A5 says that
the distribution of $Y_2$ depends on the vectors of parameters $\alpha$ and $\beta$ through the functions
$a(v, x)$ and $b(v)$. Note, in particular, that $a(v, 0)'\alpha$ is the effect of the control on never-takers,
$a(v, 1)'\alpha$ is the effect of the treatment on compliers, whereas $b(v)'\beta$ is the differential effect of
the control between compliers and never-takers. In the simplest case, we have

$$a(v, x) = (1 - x, x)' \quad \text{and} \quad b(v) = 1, \qquad (2)$$

so that $\alpha = (\alpha_1, \alpha_2)'$ and $\beta$ have an obvious interpretation as effects of the control and the
treatment on specific subpopulations.

As mentioned above, the most interesting quantity to estimate is the causal effect of the
treatment over the control in the subpopulation of compliers. In the present approach, this effect is
defined as the difference in logits (log-odds ratio)

$$\delta(v) = \mathrm{logit}[p(Y_2 = 1 \mid u, v, C = 1, X = 1)] - \mathrm{logit}[p(Y_2 = 1 \mid u, v, C = 1, X = 0)]
= [a(v, 1) - a(v, 0)]'\alpha - b(v)'\beta, \qquad (3)$$

and, with reference to the subpopulation of compliers, it corresponds to the increase of the
logit of the probability of success when $X$ goes from 0 to 1, all the other factors remaining
unchanged. This quantity depends on the covariates in $V$ and then an overall causal effect can
be computed as the average of $\delta(v)$ over suitable configurations of these covariates. Under (2),
we simply have $\delta = \alpha_2 - \alpha_1 - \beta$ and then computing this average is not necessary. Also note
that the model makes sense, not only when $Y_1$ is a response variable of the same nature as $Y_2$
that is observed before the treatment, but also when $Y_1$ is a variable which is affected neither
by the compliance status nor by the treatment received and such that the difference between
the logits in (1) is independent of $u$ and $v$.

The model based on assumptions A1-A5 is a causal model in the sense of [16] since all the
observable and unobservable factors affecting the response variables of interest are included. Indeed,
the same model could be formulated by exploiting potential outcomes, which we denote by
$Y_2^{(z,x)}$, $z, x = 0, 1$. In this case, the model could be formulated on the basis of assumptions
A1-A3 and the following assumptions which substitute A4-A5:

A4*: $Y_2^{(z,x)} = Y_2^{(x)}$ for $z, x = 0, 1$ (exclusion restriction) and $(Y_2^{(0)}, Y_2^{(1)}) \perp (Y_1, Z, X) \mid (U, V, C)$;

A5*: for any $u$, $v$, $c$ and $x$, we have

$$\mathrm{logit}[p(Y_2^{(x)} = 1 \mid u, v, c)] - \mathrm{logit}[p(Y_1 = 1 \mid u, v)] = a(v, x)'\alpha + c(1 - x)\, b(v)'\beta; \qquad (4)$$

A6*: $Y_2 = Y_2^{(x)}$ for any given value of $U$, $V$, $C$ and any given value $x$ of $X$.

It may be easily realized that the model based on assumptions A1-A5 is equivalent to that based
on assumptions A1-A3 and A4*-A6*. In a similar context, this kind of equivalence between
causal models is dealt with by [10].

It is even clearer from (4) that $\alpha$ and $\beta$ are vectors of causal parameters, in the sense that
they allow us to measure the causal effect of the treatment over control in the subpopulation
of compliers. Using potential outcomes and for simplicity under (2), this effect may now be
expressed as

$$\delta = \mathrm{logit}[p(Y_2^{(1)} = 1 \mid u, v, C = 1)] - \mathrm{logit}[p(Y_2^{(0)} = 1 \mid u, v, C = 1)] = \alpha_2 - \alpha_1 - \beta.$$



3 Behavior of the conditional logistic estimator 



In this section we show that, by the conditional logistic method and under mild conditions,
we can identify and consistently estimate the parameter vector $\alpha$ which measures the effect
of the treatment on compliers and that of the control on never-takers. However, the same is
not possible for $\beta$. In order to show this we need to derive the conditional distribution of
$(Y_1, Z, X, Y_2)$ given $V$ under the assumptions introduced in Section 2. The probability mass
function of this distribution is denoted by $f(y_1, z, x, y_2 \mid v)$.

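First of all, these assumptions imply that the probability function of the conditional
distribution of $(Y_1, Z, X, Y_2)$ given $(U, V, C)$ is equal to

$$p(y_1, z, x, y_2 \mid u, v, c) = p(y_1 \mid u, v)\, p(z \mid v)\, p(x \mid c, z)\, p(y_2 \mid u, v, c, x).$$

After some algebra we have

$$f(y_1, z, x, y_2 \mid u, v) = \frac{e^{y_1 \lambda(u,v)}}{1 + e^{\lambda(u,v)}}\, p(z \mid v) \sum_c p(x \mid c, z)\, \frac{e^{y_2[\lambda(u,v) + t(c,v,x)]}}{1 + e^{\lambda(u,v) + t(c,v,x)}}\, \pi(c \mid u, v),$$

where $\lambda(u, v) = \mathrm{logit}[p(Y_1 = 1 \mid u, v)]$, $\pi(c \mid u, v) = p(c \mid u, v)$ and the sum is extended to
$c = 0, 1$. Moreover, $t(c, v, x) = a(v, x)'\alpha + c(1 - x)\, b(v)'\beta$; see equation (1). Finally, denoting by
$\phi(u \mid v)$ the density function of the conditional distribution of $U$ given $V$ and letting $y_+ = y_1 + y_2$,
we have

$$f(y_1, z, x, y_2 \mid v) = p(z \mid v) \int \frac{e^{y_+ \lambda(u,v)}}{1 + e^{\lambda(u,v)}} \sum_c p(x \mid c, z)\, \frac{e^{y_2 t(c,v,x)}}{1 + e^{\lambda(u,v) + t(c,v,x)}}\, \pi(c \mid u, v)\, \phi(u \mid v)\, du. \qquad (5)$$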

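When $z = 1$ (treatment arm), $p(x \mid c, z)$ is equal to 1 for $x = c$ and to 0 otherwise.
Consequently, (5) reduces to

$$f(y_1, 1, x, y_2 \mid v) = p(Z = 1 \mid v) \int \frac{e^{y_+ \lambda(u,v)}}{1 + e^{\lambda(u,v)}}\, \frac{e^{y_2 a(v,x)'\alpha}}{1 + e^{\lambda(u,v) + a(v,x)'\alpha}}\, \pi(x \mid u, v)\, \phi(u \mid v)\, du, \qquad (6)$$

since it is possible to identify a subject in this arm as a never-taker or a complier according to
whether $x = 0$ or $x = 1$ and $t(c, v, x) = a(v, x)'\alpha$ when $c = x$. Now let $f(y_2 \mid v, z, x)$ denote
the probability mass function of the conditional distribution of $Y_2$ given $(V, Z, X, Y_+ = 1)$, with
$Y_+ = Y_1 + Y_2$. The above result implies that

$$f(y_2 \mid v, 1, x) = \frac{f(1 - y_2, 1, x, y_2 \mid v)}{f(0, 1, x, 1 \mid v) + f(1, 1, x, 0 \mid v)} = \frac{e^{y_2 a(v,x)'\alpha}}{1 + e^{a(v,x)'\alpha}}. \qquad (7)$$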

This probability function depends neither on the distribution of $U$ nor on the function $\lambda(u, v)$.
Consequently, as already mentioned above, we can identify the parameters in $\alpha$ measuring the
effect of the treatment on compliers and that of the control on never-takers and the conditional
logistic estimator of these parameters is consistent. It is also worth observing that under
assumption (2), which implies that $\alpha = (\alpha_1, \alpha_2)'$, the conditional logistic estimator of these
parameters has an explicit form given by

$$\hat\alpha_1 = \log\frac{n_{0101}}{n_{1100}} \quad \text{and} \quad \hat\alpha_2 = \log\frac{n_{0111}}{n_{1110}},$$

where $n_{y_1 z x y_2}$ denotes the number of subjects in the sample with response configuration $(y_1, y_2)$
who are in the control or treatment arm (according to whether $z = 0$ or $z = 1$) and did not receive
or received the treatment (according to whether $x = 0$ or $x = 1$).
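
As a minimal sketch of the explicit form above, the two log-ratios can be computed directly from the four discordant counts of the treatment arm. The function name `closed_form_alpha` and the counts used in the example are hypothetical and serve only as an illustration; they are not taken from the BSE data.

```python
import math

def closed_form_alpha(n0101, n1100, n0111, n1110):
    """Closed-form conditional logistic estimates under assumption (2).

    Arguments are counts n_{y1 z x y2} of discordant pairs in the
    treatment arm (z = 1): never-takers have x = 0, compliers x = 1.
    """
    alpha1 = math.log(n0101 / n1100)  # effect of control on never-takers
    alpha2 = math.log(n0111 / n1110)  # effect of treatment on compliers
    return alpha1, alpha2

# Hypothetical counts, for illustration only.
a1, a2 = closed_form_alpha(n0101=30, n1100=20, n0111=45, n1110=15)
print(a1, a2)
```

Both estimates require all four counts to be strictly positive, which is the situation addressed by the rule of thumb for missing discordant configurations.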

When $z = 0$ (control arm), $p(x \mid c, z)$ is equal to 1 for $x = 0$ and to 0 otherwise (regardless of
$c$), since no subject in this arm can access the treatment. Then, expression (5) reduces to

$$f(y_1, 0, 0, y_2 \mid v) = p(Z = 0 \mid v)\, g(y_1, y_2 \mid v),$$

$$g(y_1, y_2 \mid v) = \int \frac{e^{y_+ \lambda(u,v)}}{1 + e^{\lambda(u,v)}} \sum_c \frac{e^{y_2 t(c,v,0)}}{1 + e^{\lambda(u,v) + t(c,v,0)}}\, \pi(c \mid u, v)\, \phi(u \mid v)\, du,$$

which is more complex than (6), being based on a mixture between the conditional distribution
of $Y_2$ for the subpopulation of compliers and for that of never-takers. In this case, we cannot
remove the dependence on the distribution of $U$ and on $\lambda(u, v)$ via conditioning on $y_+$. Then, we
cannot identify and consistently estimate the parameters in $\beta$ corresponding to the differential
effect of control on compliers with respect to never-takers. The same happens for the causal
effect $\delta(v)$ defined in (3).

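In order to better investigate this point we consider an approximation of $\log g(y_1, y_2 \mid v)$ based
on a first-order Taylor series expansion around $\beta = 0$, with $0$ denoting a column vector of zeros
of suitable dimension. Note that this point corresponds to the situation in which the control
has the same effect on compliers and never-takers. We have that

$$\log g(y_1, y_2 \mid v) \approx \log g_0(y_1, y_2 \mid v) + h(y_1, y_2 \mid v)\, b(v)'\beta,$$

where $g_0(y_1, y_2 \mid v)$ is equal to $g(y_1, y_2 \mid v)$ computed at $\beta = 0$, that is

$$g_0(y_1, y_2 \mid v) = \int \frac{e^{y_+ \lambda(u,v)}}{1 + e^{\lambda(u,v)}}\, \frac{e^{y_2 t(0,v,0)}}{1 + e^{\lambda(u,v) + t(0,v,0)}}\, \phi(u \mid v)\, du,$$

and

$$h(y_1, y_2 \mid v) = \frac{1}{g_0(y_1, y_2 \mid v)} \int \frac{e^{y_+ \lambda(u,v)}}{1 + e^{\lambda(u,v)}}\, \frac{e^{y_2 t(0,v,0)}}{1 + e^{\lambda(u,v) + t(0,v,0)}} \left[ y_2 - \frac{e^{\lambda(u,v) + t(0,v,0)}}{1 + e^{\lambda(u,v) + t(0,v,0)}} \right] \pi(1 \mid u, v)\, \phi(u \mid v)\, du.$$

Now, because $g_0(0, 1 \mid v) = g_0(1, 0 \mid v)\, e^{t(0,v,0)}$ and recalling that $t(0, v, 0) = a(v, 0)'\alpha$, after some
algebra (see Appendix A1 for details) we find

$$\log \frac{g(0, 1 \mid v)}{g(1, 0 \mid v)} = \log g(0, 1 \mid v) - \log g(1, 0 \mid v) \approx a(v, 0)'\alpha + h(v)\, b(v)'\beta, \qquad (8)$$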



where $h(v)$ is a correction factor defined as

$$h(v) = \frac{1}{g_0(1, 0 \mid v)} \int \frac{e^{\lambda(u,v)}}{1 + e^{\lambda(u,v)}}\, \frac{1}{1 + e^{\lambda(u,v) + t(0,v,0)}}\, \pi(1 \mid u, v)\, \phi(u \mid v)\, du. \qquad (9)$$

This correction term is simply equal to $\pi(1 \mid v) = p(C = 1 \mid v)$ when $C$ is conditionally independent
of $U$ given $V$. We then have

$$f(y_2 \mid v, 0, 0) = \frac{f(1 - y_2, 0, 0, y_2 \mid v)}{f(0, 0, 0, 1 \mid v) + f(1, 0, 0, 0 \mid v)} \approx \frac{e^{y_2[a(v,0)'\alpha + h(v) b(v)'\beta]}}{1 + e^{a(v,0)'\alpha + h(v) b(v)'\beta}}, \qquad (10)$$

which shows that a conditional logistic estimator based on regressing $Y_2$ on $V$, only for the cases
in which $Y_+ = 1$ and $Z = 0$, would estimate a quantity which does not correspond to the effect
of the control either on compliers or on never-takers.

To clarify the above point consider the case of absence of covariates in which assumption (2)
holds. In this case, the conditional logistic estimator is equal to $\log(n_{0001}/n_{1000})$ which converges
in probability to a quantity close to $\alpha_1 + h\beta$. Then, provided that $h$ can be suitably estimated, a
correction for this estimator can be implemented so as to reduce its bias. This idea is exploited
to propose a general approach which allows us to considerably reduce the bias of the conditional
logistic estimator applied to the data coming from the control group.
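
The quality of this approximation can be checked numerically in a deliberately simplified setting. The sketch below assumes no covariates, assumption (2), and no unobserved heterogeneity, so that the logit of $p(Y_1 = 1)$ is a constant `lam` and the integral over $u$ drops out; under this simplification the correction factor $h$ equals the compliance probability `pi1`. The function name and the parameter values are illustrative assumptions, not quantities from the paper.

```python
import math

def control_arm_limit(lam, alpha1, beta, pi1):
    """Exact log[g(0,1)/g(1,0)] in the control arm when lambda(u, v) = lam
    is constant, so the mixture is only over compliance status c."""
    t = {0: alpha1, 1: alpha1 + beta}   # t(c, v, 0) under assumption (2)
    w = {0: 1.0 - pi1, 1: pi1}          # pi(c): never-takers, compliers
    # The common factor e^lam / (1 + e^lam) cancels in the ratio.
    g01 = sum(w[c] * math.exp(t[c]) / (1.0 + math.exp(lam + t[c])) for c in (0, 1))
    g10 = sum(w[c] / (1.0 + math.exp(lam + t[c])) for c in (0, 1))
    return math.log(g01 / g10)

# Illustrative values only.
exact = control_arm_limit(lam=0.0, alpha1=0.5, beta=0.2, pi1=0.6)
approx = 0.5 + 0.6 * 0.2                # alpha1 + h * beta with h = pi1
exact_at_zero = control_arm_limit(lam=0.0, alpha1=0.5, beta=0.0, pi1=0.6)
print(exact, approx, exact_at_zero)
```

With these values the exact limit and the first-order approximation differ only in the third decimal place, and they coincide when `beta` is 0, in line with the expansion being taken around $\beta = 0$.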



4 Corrected conditional logistic estimator 

With reference to a sample of $n$ subjects included in the two-arm experimental study, let $y_{i1}$
denote the observed value of $Y_1$ for subject $i$, let $y_{i2}$ denote the value of $Y_2$ for the same subject
and let $v_i$, $z_i$ and $x_i$ denote the corresponding values of $V$, $Z$ and $X$, respectively.

4.1 The estimator 

In order to estimate the parameters of the causal model introduced in Section 2, we rely on a
standard logistic regression applied to the data coming from the treatment arm (for which we
can disentangle compliers from never-takers) and a logistic regression based on approximation
(10) for the data coming from the control arm. Note that to exploit this approximation we need
to estimate the correction term $h(v)$ defined in (9). For the sake of simplicity, we assume that $C$
is conditionally independent of $U$ given $V$, so that this correction term corresponds to $\pi(1 \mid v)$
and for the latter we assume the logit model

$$\mathrm{logit}[\pi(1 \mid v)] = m(v)'\eta, \qquad (11)$$

where $m(v)$ is a known function of the observed covariates. The implications of this assumption
will be studied in the following (see Section 4.2.1). The result is an estimator of the causal effect
parameters whose main advantage is its simplicity of use. The estimator recalls the two-step
estimator of the selection model [14] and the estimator proposed by [15] in the causal inference
literature.

The two steps on which the proposed estimation method is based are the following: 

1. Estimation of $\eta$. Since the compliance status may be directly observed for those subjects
assigned to the treatment arm, estimation of $\eta$ is based on the observed values $x_i$ and $v_i$
for every $i$ such that $z_i = 1$. Taking into account that the distribution of $Z$ is allowed to
depend on $V$, we then proceed by maximizing the weighted log-likelihood

$$\ell_1(\eta) = \sum_i \frac{z_i}{p(z_i \mid v_i)} \left[ x_i \log \pi(1 \mid v_i) + (1 - x_i) \log \pi(0 \mid v_i) \right],$$

with weights corresponding to the inverse probabilities $1/p(z_i \mid v_i)$.

2. Estimation of $\alpha$ and $\beta$. This is done by maximizing the conditional log-likelihood

$$\ell_2(\alpha, \beta; \hat\eta) = \sum_i d_i\, \ell_{i2}(\alpha, \beta; \hat\eta), \qquad (12)$$

where $\ell_{i2}(\alpha, \beta; \hat\eta)$ corresponds to the logarithm of (7) when $z_i = 1$ and to that of (10) when
$z_i = 0$, once the parameter vector $\eta$ has been substituted by its estimate $\hat\eta$ obtained at the
first step. Moreover, $d_i$ is a dummy variable equal to 1 if $y_{i+} = 1$, with $y_{i+} = y_{i1} + y_{i2}$, and
to 0 otherwise, so that subjects with response configuration (0, 0) or (1, 1) are excluded
since the conditional probability of these configurations given their sum would not depend
on either $\alpha$ or $\beta$.

Maximization of $\ell_1(\eta)$ may be performed by a standard Newton-type algorithm; the data to
be used are only those concerning the subjects in the treatment arm. The same algorithm may
be used to maximize $\ell_2(\alpha, \beta; \hat\eta)$. In this case, the data to be used concern all subjects with sum of
the response variables (before and after) equal to 1. Moreover, collecting the parameter vectors
$\alpha$ and $\beta$ in a unique vector $\theta = (\alpha', \beta')'$, the design matrix to be used has rows $w(v_i, z_i, x_i)'$,
where

$$w(v_i, z_i, x_i) = z_i \begin{pmatrix} a(v_i, x_i) \\ 0 \end{pmatrix} + (1 - z_i) \begin{pmatrix} a(v_i, 0) \\ \hat\pi(1 \mid v_i)\, b(v_i) \end{pmatrix},$$

which corresponds to $(a(v_i, x_i)', 0')'$ when $z_i = 1$ and to $(a(v_i, 0)', \hat\pi(1 \mid v_i)\, b(v_i)')'$ when $z_i = 0$. At
the end, by substituting the subvectors $\hat\alpha$ and $\hat\beta$ of $\hat\theta$ into (3) we obtain an estimate $\hat\delta(v)$ of the
causal effect of the treatment, with respect to control, for compliers with covariate configuration
$v$. Under (2), this estimate reduces to $\hat\delta = \hat\alpha_2 - \hat\alpha_1 - \hat\beta$.
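
For intuition, the two steps can be sketched in the simplest setting: no covariates, assumption (2), and $m(v) = 1$. In that case the second-step likelihood has three parameters and three groups of discordant pairs (never-takers and compliers in the treatment arm, and the whole control arm), so its maximizer matches the three empirical logits and no iterative algorithm is needed. This closed form is our own observation for this special case rather than a formula given in the text; the function name, the counts and the value of `pi_hat` are hypothetical.

```python
import math

def two_step_closed_form(counts, pi_hat):
    """Two-step estimates without covariates under assumption (2).

    counts: dict keyed by (y1, z, x, y2) with the discordant-pair counts.
    pi_hat: first-step estimate of the probability of being a complier.
    """
    logit_nt = math.log(counts[(0, 1, 0, 1)] / counts[(1, 1, 0, 0)])    # z=1, x=0
    logit_co = math.log(counts[(0, 1, 1, 1)] / counts[(1, 1, 1, 0)])    # z=1, x=1
    logit_ctrl = math.log(counts[(0, 0, 0, 1)] / counts[(1, 0, 0, 0)])  # z=0
    alpha1 = logit_nt
    alpha2 = logit_co
    # Control-arm logit is modelled as alpha1 + pi_hat * beta, as in (10).
    beta = (logit_ctrl - alpha1) / pi_hat
    delta = alpha2 - alpha1 - beta
    return alpha1, alpha2, beta, delta

# Hypothetical counts, for illustration only.
counts = {(0, 1, 0, 1): 30, (1, 1, 0, 0): 20,
          (0, 1, 1, 1): 45, (1, 1, 1, 0): 15,
          (0, 0, 0, 1): 40, (1, 0, 0, 0): 20}
alpha1, alpha2, beta, delta = two_step_closed_form(counts, pi_hat=0.5)
print(alpha1, alpha2, beta, delta)
```

The division by `pi_hat` shows why the estimate of $\beta$, and hence of $\delta$, becomes unstable when few subjects comply.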

With small samples, which are not uncommon in certain experimental studies, it might
happen that discordant response configurations, of type (0, 1) or (1, 0), are not observed for
certain configurations of $(Z, X)$. This would imply that the estimator of $\delta$ cannot be computed.
To overcome this problem, we follow a rule of thumb consisting of: (i) checking that both
discordant response configurations are present for each observable configuration of $(Z, X)$; (ii)
adding to the dataset the response configurations which are missing. To the added response
configurations we assign a vector of covariates $v_i$ equal to the sample average and weight 0.5 in
the conditional log-likelihood (12). The simulation study in Section 4.2.2 allows us to evaluate
the impact of this correction on the inferential properties of the estimator.

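Concerning the estimation of the variance-covariance matrix of the estimators $\hat\alpha$ and $\hat\beta$,
consider that $(\hat\eta', \hat\theta')'$ corresponds to the solution with respect to $(\eta', \theta')'$ of the equation
$s(\eta, \theta) = 0$, where

$$s(\eta, \theta) = \begin{pmatrix} \partial \ell_1(\eta) / \partial \eta \\ \partial \ell_2(\theta; \eta) / \partial \theta \end{pmatrix},$$

with $\ell_2(\theta; \eta) = \ell_2(\alpha, \beta; \eta)$. When logit model (11) is assumed, the first subvector of $s(\eta, \theta)$ is
equal to

$$\frac{\partial \ell_1(\eta)}{\partial \eta} = \sum_i \frac{z_i}{p(z_i \mid v_i)}\, [x_i - \pi(1 \mid v_i)]\, m(v_i),$$

whereas the second subvector is equal to

$$\frac{\partial \ell_2(\theta; \eta)}{\partial \theta} = \sum_i d_i \left[ y_{i2} - \frac{e^{w(v_i, z_i, x_i)'\theta}}{1 + e^{w(v_i, z_i, x_i)'\theta}} \right] w(v_i, z_i, x_i).$$

From [17] and [18], the following sandwich estimator of the variance-covariance matrix of $(\hat\eta', \hat\theta')'$
results

$$\hat\Sigma(\hat\eta, \hat\theta) = H^{-1} K (H')^{-1}, \qquad (13)$$

where $H$ is the derivative of $s(\eta, \theta)$ with respect to $(\eta', \theta')$ and $K$ is an estimate of the variance-
covariance matrix of $s(\eta, \theta)$, both computed at $\eta = \hat\eta$ and $\theta = \hat\theta$. Explicit expressions for these
matrices are given in Appendix A2. We can then obtain an estimate of the variance-covariance
matrix of $\hat\theta$, denoted by $\hat\Sigma(\hat\theta)$ or alternatively by $\hat\Sigma(\hat\alpha, \hat\beta)$, as a suitable block of the matrix
$\hat\Sigma(\hat\eta, \hat\theta)$. We can also obtain the standard error for the estimator of the causal effect $\hat\delta(v)$. In
particular, when (2) holds the standard error for $\hat\delta$ is simply $\mathrm{se}(\hat\delta) = \sqrt{\Delta' \hat\Sigma(\hat\theta) \Delta}$, where
$\Delta = (-1, 1, -1)'$ is a vector such that $\delta = \Delta'\theta$.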
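
The last formula is a plain quadratic form and can be sketched in a few lines. The function name `se_delta` and the 3 x 3 matrix below are hypothetical; in practice the matrix would be the block of the sandwich estimate (13) referring to $(\alpha_1, \alpha_2, \beta)$.

```python
import math

def se_delta(sigma_theta):
    """Standard error of delta = alpha2 - alpha1 - beta under assumption (2).

    sigma_theta: 3 x 3 variance-covariance matrix (nested lists) of the
    estimators of (alpha1, alpha2, beta), in this order.
    """
    d = (-1.0, 1.0, -1.0)  # contrast vector such that delta = d' theta
    quad = sum(d[i] * sigma_theta[i][j] * d[j]
               for i in range(3) for j in range(3))
    return math.sqrt(quad)

# Hypothetical covariance matrix, for illustration only.
sigma = [[0.04, 0.00, 0.01],
         [0.00, 0.09, 0.00],
         [0.01, 0.00, 0.16]]
se = se_delta(sigma)
print(se)
```

Note that a positive covariance between the estimators of $\alpha_1$ and $\beta$ inflates the standard error, since both enter $\delta$ with the same sign.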



4.2 Properties of the two-step estimator 
4.2.1 Asymptotic properties 

Suppose that $v_i$, $y_{i1}$, $z_i$, $x_i$ and $y_{i2}$, with $i = 1, \ldots, n$, are independently drawn from the true
model based on assumptions A1-A5 with parameters $\alpha = \alpha_0$ and $\beta = \beta_0$. This model must
ensure that

$$f(y_1, z, x, y_2 \mid v) > 0 \quad \text{for all } v, y_1, z, x, y_2.$$

Provided that the functions $a(v, x)$ and $b(v)$ satisfy standard regularity conditions, which are
necessary to ensure that the expected value of the second derivative of $\ell_2(\alpha, \beta; \hat\eta)/n$ is of full
rank, the theory on maximum likelihood estimation of misspecified models of [18], see also [19],
implies that the two-step estimators $\hat\alpha$ and $\hat\beta$ satisfy the following asymptotic properties as
$n \to \infty$:

• consistency: $\hat\alpha \to_p \alpha_*$ and $\hat\beta \to_p \beta_*$, with $\alpha_*$ and $\beta_*$ being the pseudo-true parameter
vectors which are equal, respectively, to the true parameter vectors $\alpha_0$ and $\beta_0$ when $\beta_0 = 0$;

• asymptotic normality: $\sqrt{n}\,(\hat\alpha' - \alpha_*', \hat\beta' - \beta_*')' \to_d N[0, \Omega(\alpha_*, \beta_*)]$, with $\Omega(\alpha_*, \beta_*)$ being the limit in
probability of the matrix $\hat\Sigma(\hat\alpha, \hat\beta)/n$; see definition (13).

We recall that $\to_p$ means convergence in probability, whereas $\to_d$ means convergence in distribution.
Moreover, in order to give a formal definition of $\alpha_*$ and $\beta_*$ we have to consider that these
correspond to the point at which the supremum of $E_0[\ell_2(\alpha, \beta; \eta_*)/n]$ is attained, where $E_0$ denotes
expectation under the true model and $\eta_*$ denotes the limit in probability of the estimator $\hat\eta$ computed
at the first step. Clearly, since the log-likelihood $\ell_2(\alpha, \beta; \eta_*)$ is based on an approximation of the true
model around $\beta = 0$, we have that $\alpha_* = \alpha_0$ and $\beta_* = \beta_0$ when $\beta_0 = 0$. This implies that in
this case $\hat\delta(v) \to_p \delta_0(v)$, with $\delta_0(v)$ being equal to the true causal effect of the treatment over
control for a complier with covariates $v$. Obviously, this is not ensured when $\beta_0 \neq 0$, but we
expect $\alpha_*$ and $\beta_*$ to be reasonably close to, respectively, $\alpha_0$ and $\beta_0$ when $\beta_0$ is not too far from
0. The same may be said about the estimator $\hat\delta(v)$ of $\delta(v)$.

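In order to illustrate the previous point, we considered a true model which involves only one
observable covariate $V$ and under which the joint distribution of $(U, V)$ is

$$\begin{pmatrix} U \\ V \end{pmatrix} \sim N_2\left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right], \qquad (14)$$

with $\rho = 0.00, 0.75$. Moreover, we assumed (2) and that $Y_1$, $C$, $Z$, $Y_2$ have Bernoulli distribution
with probabilities of success chosen, respectively, as follows

$$\begin{aligned}
p(Y_1 = 1 \mid u, v) &= \mathrm{expit}[(u + v)/\sqrt{1 + \rho} - 1], \\
\pi(1 \mid u, v) &= \mathrm{expit}[(u + v)/\sqrt{1 + \rho}/2], \\
p(Z = 1 \mid v) &= \mathrm{expit}(-v), \\
p(Y_2 = 1 \mid u, v, c, x) &= \mathrm{expit}[(u + v)/\sqrt{1 + \rho} - 1 + (1 - x)\alpha_1 + x\alpha_2 + c(1 - x)\beta],
\end{aligned} \qquad (15)$$

where $\mathrm{expit}(\cdot)$ is the inverse function of $\mathrm{logit}(\cdot)$, $\beta = -1.00, -0.75, \ldots, 1.00$ and $\alpha_1$ is defined so
that the causal effect $\delta = \alpha_2 - \alpha_1 - \beta$ is equal to 0 or 1, with $\alpha_2 = 1$ when $\delta = 0$ and $\alpha_2 = 2$
when $\delta = 1$. Under this model, we computed the limit in probability $\delta_*$ of each of the following
estimators: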
when 5 — 1. Under this model, we computed the limit in probability 5* of each of the following 
estimators: 

• $\hat\delta_{null}$: two-step conditional logistic estimator of $\delta$ in which the probability to comply is
assumed not to depend on the covariate; this is equivalent to letting $m(v) = 1$ in (11);

• $\hat\delta_{cov}$: as above with $m(v) = (1, v)'$, so that the covariate is also used to predict the
probability to comply;

• $\hat\delta_{itt}$: intention to treat (ITT) estimator based on the conditional logistic regression of $Y_2$
on $(1, Z)$ given $Y_+ = 1$;

• $\hat\delta_{tr}$: treatment received (TR) estimator based on the conditional logistic regression of $Y_2$
on $(1, X)$ given $Y_+ = 1$.

The limit in probability of these estimators is represented, with respect to the true value of $\beta$,
in Figure 2.

It may be observed that, when the true value of $\beta$ is 0, the limit $\delta_*$ is equal to the true value
of $\delta$ for both estimators $\hat\delta_{null}$ and $\hat\delta_{cov}$. This is in agreement with our conclusion above about
the asymptotic behavior of the proposed estimator. When the true value of $\beta$ is different from
0, instead, this does not happen but, at least for $\hat\delta_{cov}$, the distance of $\delta_*$ from the true $\delta$ is
surprisingly small and does not seem to be affected by the correlation between $U$ and $V$ which
is measured by $\rho$. We recall that, although our estimator is derived under the assumption that $C$
is conditionally independent of $U$ given $V$, this result is obtained under a model which assumes
that both $U$ and $V$ have a direct effect on $C$.
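
The first of these conclusions can be checked with a rough Monte Carlo sketch of the estimator with $m(v) = 1$. The code below is not the procedure used for Figure 2, which reports exact limits in probability; it simulates one large sample and uses the fact that, without covariates in the second step, the maximizer of (12) matches the three empirical logits. The scaling of the linear predictor by $\sqrt{1 + \rho}$ is our reading of (15) and should be treated as an assumption, as should the function name, the sample size and the seed.

```python
import math
import random

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def simulate_delta_null(n, rho, alpha1, alpha2, beta, seed=0):
    """One simulated sample and the two-step estimate of delta with m(v) = 1."""
    rng = random.Random(seed)
    counts = {}          # discordant pairs keyed by (z, x, y2)
    num = den = 0.0      # inverse-probability-weighted share of compliers
    scale = math.sqrt(1.0 + rho)   # assumed normalisation of u + v
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        v = rho * u + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        s = (u + v) / scale
        y1 = rng.random() < expit(s - 1.0)
        c = rng.random() < expit(s / 2.0)
        pz = expit(-v)
        z = rng.random() < pz
        x = z and c                       # only compliers in the treatment arm are treated
        t = (1 - x) * alpha1 + x * alpha2 + c * (1 - x) * beta
        y2 = rng.random() < expit(s - 1.0 + t)
        if z:                             # first step uses the treatment arm only
            num += x / pz
            den += 1.0 / pz
        if y1 + y2 == 1:                  # keep discordant pairs only
            key = (int(z), int(x), int(y2))
            counts[key] = counts.get(key, 0) + 1
    pi_hat = num / den
    def emp_logit(z, x):
        return math.log(counts[(z, x, 1)] / counts[(z, x, 0)])
    a1, a2 = emp_logit(1, 0), emp_logit(1, 1)
    b = (emp_logit(0, 0) - a1) / pi_hat
    return a2 - a1 - b

# True delta = alpha2 - alpha1 - beta = 1 with beta = 0.
delta_hat = simulate_delta_null(n=100_000, rho=0.0, alpha1=1.0, alpha2=2.0, beta=0.0, seed=1)
print(delta_hat)
```

With $\beta = 0$ the estimate should fall near the true value of 1 up to simulation noise; rerunning with a nonzero `beta` gives an informal impression of the residual bias discussed above.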

A final point concerns the ITT and TR estimators. The first is adequate only if the true
value of $\delta$ is equal to 0 (plots on the left of Figure 2), whereas it is completely inadequate when
it is equal to 1 (plots on the right). The TR estimator, instead, is consistent only when the true
value of $\beta$ is equal to 0, but in the other cases it has a strong bias. Overall, even if based on a
logistic regression method, these two estimators behave much worse than our estimator.

Figure 2: Limit of the estimators $\hat\delta_{null}$ (dashed line), $\hat\delta_{cov}$ (solid line), $\hat\delta_{itt}$ (dotted line) and $\hat\delta_{tr}$
(dash-dotted line) under assumptions (14) and (15), with $\rho = 0.00, 0.75$, $\beta$ between -1 and 1
and $\delta = 0, 1$, with $\alpha_1$ and $\alpha_2$ defined accordingly.



4.2.2 Finite-sample properties 

In order to assess the finite-sample properties of the two-step estimator, we performed a
simulation study based on 1000 samples of size n = 200, 500 generated from the model based on
assumptions (14) and (15). For each simulated sample, we computed the estimators $\hat\alpha = (\hat\alpha_1, \hat\alpha_2)'$,
$\hat\beta$ and $\hat\delta$ based on a model for the probability to comply of type (11), with $m(v) = (1, v)'$. Using
the notation of Section 4.2.1, these estimators could also be denoted by $\hat\alpha_{cov}$, $\hat\beta_{cov}$ and $\hat\delta_{cov}$. The
results, in terms of bias and standard deviation of the estimators and in terms of mean of the
standard errors, are reported in Table 1 (when the true value of $\delta$ is 0) and in Table 2 (when
the true value of $\delta$ is 1). Note that, for small sample sizes such as those we are considering here, it
may happen that some discordant configurations are not observed. Consequently, we apply the rule of
thumb described in Section 4.1 to prevent instabilities of the estimator.
Table 1: Simulation results for the proposed two-step estimator based on 1000 samples of size
n = 500, 1000 generated under different models based on assumptions (14) and (15), with $\rho$ =
0.00, 0.75 and different values of $\alpha_1$ and $\beta$ (in each case $\alpha_2 = 1$ and $\delta = 0$).



The simulations show that the estimators always have a very low bias which, as may be
expected, tends to be smaller when the true value of $\beta$ is equal to 0 and when n = 500 instead
of n = 200. It is also worth noting that this bias is not considerably affected by $\rho$. These
conclusions are in agreement with those regarding the asymptotic behavior of the estimator drawn
on the basis of Figure 2. Consequently, the rule of thumb that we use when not all the possible
discordant configurations are present seems to work properly.
Table 2: Simulation results for the proposed two-step estimator based on 1000 samples of size
n = 500, 1000 generated under different models based on assumptions (14) and (15), with $\rho$ =
0.00, 0.75 and different values of $\alpha_1$ and $\beta$ (in each case $\alpha_2 = 2$ and $\delta = 1$).

As concerns the variability of the estimators, we observe that the standard error of each
of them is roughly proportional to $1/\sqrt{n}$. This property is also in agreement with the asymptotic
results illustrated in Section 4.2.1. Finally, for each estimator, the average standard error is close
enough to its standard deviation. Relevant differences are only observed for n = 200 when the
standard error occasionally tends to be larger than the standard deviation. This confirms the
validity of the proposed estimator of the variance-covariance matrix of $\hat\theta$ which is based on the
sandwich formula given in (13).



5 Dealing with missing responses 



We now illustrate how the proposed causal model and the two-step conditional logistic estimator
may be extended to the case of missing responses. For this aim, we introduce the binary indicator
$R_h$, $h = 1, 2$, equal to 1 if the response variable $Y_h$ is observable and to 0 otherwise.

5.1 Causal model 

With missing responses, we extend the model introduced in Section 2 by assuming that:

B1: $R_1 \perp\!\!\!\perp Y_1 \mid (U, V)$;

B2: $C \perp\!\!\!\perp (R_1, Y_1) \mid (U, V)$;

B3: $Z \perp\!\!\!\perp (U, Y_1, R_1, C) \mid V$;

B4: $X \perp\!\!\!\perp (U, V, Y_1, R_1) \mid (C, Z)$ and, with probability 1, $X = Z$ when $C = 1$ (compliers) and
$X = 0$ when $C = 0$ (never-takers);

B5: $Y_2 \perp\!\!\!\perp (Y_1, R_1, Z) \mid (U, V, C, X)$;

B6: $R_2 \perp\!\!\!\perp (Y_1, R_1, Z, Y_2) \mid (U, V, C, X)$;

B7: for any $u$, $v$, $c$ and $x$, we have

$$\mathrm{logit}[p(Y_2 = 1|u, v, c, x)] - \mathrm{logit}[p(Y_1 = 1|u, v)] = a(v, x)'\alpha + c(1 - x)b(v)'\beta,$$

with the functions $a(v, x)$ and $b(v)$ and the corresponding parameter vectors defined as
in Section 2.1.

The essentially new assumptions are B1 and B6, which concern the conditional independence between
$R_1$ and $Y_1$ and that between $R_2$ and $Y_2$ given the observable and unobservable covariates. This
requirement is weaker than the assumption that responses are missing at random given the
observable covariates [20].

The resulting model is represented by the DAG in Figure 3. 
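
To make the dependence structure concrete, the following sketch draws one sample from a toy model that respects B1-B7. All distributional choices are illustrative assumptions: $(U, V)$ standard bivariate normal with correlation `rho`, $\mathrm{logit}\,p(Y_1 = 1|u,v) = u + v$, compliance depending on $V$ only, randomization with probability 0.5, the scalar parametrization $a(v,x) = (1 - x, x)'$ and $b(v) = 1$, and missingness probabilities loosely inspired by the form used later in the paper. The function name `simulate_b1_b7` is hypothetical.

```python
import numpy as np

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def simulate_b1_b7(n, alpha1=1.0, alpha2=1.0, beta=0.0, rho=0.0, seed=0):
    """Draw one sample from a toy model compatible with B1-B7 (assumed choices)."""
    rng = np.random.default_rng(seed)
    # Observable (V) and unobservable (U) covariates with correlation rho.
    v = rng.standard_normal(n)
    u = rho * v + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    lam = u + v                                   # logit of p(Y1 = 1 | u, v)
    # B1: Y1 and R1 are drawn independently given (U, V).
    y1 = rng.binomial(1, expit(lam))
    r1 = rng.binomial(1, expit(1.0 + lam / 2.0))
    # B2: compliance status depends on the covariates only (here on V only).
    c = rng.binomial(1, expit(v))
    # B3: assignment is randomized.
    z = rng.binomial(1, 0.5, size=n)
    # B4: compliers take what is assigned, never-takers never take the treatment.
    x = z * c
    # B5 and B7: shift on the logit scale between the two occasions.
    shift = alpha1 * (1 - x) + alpha2 * x + c * (1 - x) * beta
    y2 = rng.binomial(1, expit(lam + shift))
    # B6: R2 depends on (U, V, C, X) only.
    r2 = rng.binomial(1, expit(1.0 + lam / 2.0 + c / 2.0 + x / 2.0))
    return {"u": u, "v": v, "c": c, "z": z, "x": x,
            "y1": y1, "r1": r1, "y2": y2, "r2": r2}

sample = simulate_b1_b7(1000, beta=0.5, rho=0.75, seed=42)
print({k: float(np.mean(a)) for k, a in sample.items() if k not in ("u", "v")})
```

In real data `u` and `c` are not observed, and `y1`, `y2` are available only where `r1`, `r2` equal 1; the sketch returns them all so that the generating mechanism can be inspected.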

5.2 Two-step conditional logistic estimator 

Under assumptions B1-B7, we have that

$$p(y_1, r_1, z, x, y_2, r_2|u, v, c) = p(y_1|u, v)\,p(r_1|u, v)\,p(z|v)\,p(x|c, z)\,p(y_2|u, v, c, x)\,p(r_2|u, v, c, x).$$



Figure 3: DAG for the model based on assumptions B1-B7. $U$ and $V$ represent unobservable
and observable covariates affecting the response variables $Y_1$ and $Y_2$ and the indicator variables
$R_1$ and $R_2$ for the response variables being observable, $C$ is a binary variable for the compliance
status and $Z$ and $X$ are binary variables for the assigned and the received treatment.

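Then, marginalizing with respect to $C$ and $U$, we find that the probability mass function of the
conditional distribution of $(Y_1, R_1, Z, X, Y_2, R_2)$ given $V$ is equal to

$$f^*(y_1, r_1, z, x, y_2, r_2|v) = p(z|v) \int \frac{e^{y_+\lambda(u,v)}}{1 + e^{\lambda(u,v)}}\, p(r_1|u,v) \times \left[\sum_c \frac{e^{y_2 t(c,v,x)}}{1 + e^{\lambda(u,v) + t(c,v,x)}}\, p(x|c,z)\, p(r_2|u,v,c,x)\, \pi(c|u,v)\right] \phi(u|v)\,du,$$

where, as in Section 3, $t(c, v, x) = a(v, x)'\alpha + c(1 - x)b(v)'\beta$.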
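When $z = 1$, the above expression simplifies to

$$f^*(y_1, r_1, 1, x, y_2, r_2|v) = p(Z = 1|v) \times \int \frac{e^{y_+\lambda(u,v)}}{1 + e^{\lambda(u,v)}}\, p(r_1|u,v)\, \frac{e^{y_2 a(v,x)'\alpha}}{1 + e^{\lambda(u,v) + a(v,x)'\alpha}}\, p(r_2|u,v,c,x)\, \pi(c|u,v)\, \phi(u|v)\,du, \quad c = x,$$

and this implies that

$$f^*(y_2|v, 1, x) = \frac{f^*(1 - y_2, 1, 1, x, y_2, 1|v)}{f^*(0, 1, 1, x, 1, 1|v) + f^*(1, 1, 1, x, 0, 1|v)} = \frac{e^{y_2 a(v,x)'\alpha}}{1 + e^{a(v,x)'\alpha}},$$

where, in general, $f^*(y_2|v, z, x) = p(y_2|v, R_1 = 1, z, x, R_2 = 1, Y_+ = 1)$. This is due to the
definition of $p(x|c, z)$ which, as already noted, may only be equal to 0 or 1. The same does not
happen when $z = 0$ since in this case

$$f^*(y_1, r_1, 0, 0, y_2, r_2|v) = p(Z = 0|v)\, g^*(y_1, r_1, y_2, r_2|v),$$

with

$$g^*(y_1, r_1, y_2, r_2|v) = \int \frac{e^{y_+\lambda(u,v)}}{1 + e^{\lambda(u,v)}}\, p(r_1|u,v) \sum_c \frac{e^{y_2 t(c,v,0)}}{1 + e^{\lambda(u,v) + t(c,v,0)}}\, p(r_2|u,v,c,0)\, \pi(c|u,v)\, \phi(u|v)\,du.$$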



Then, as in Section 4, we consider a first-order Taylor series expansion of $\log g^*(y_1, r_1, y_2, r_2|v)$
around $\beta = 0$ and we find that

$$\log g^*(y_1, r_1, y_2, r_2|v) \approx \log g_0^*(y_1, r_1, y_2, r_2|v) + h^*(y_1, r_1, y_2, r_2|v)\, b(v)'\beta,$$

where $g_0^*(y_1, r_1, y_2, r_2|v)$ is the function $g^*(y_1, r_1, y_2, r_2|v)$ computed at $\beta = 0$ and
$h^*(y_1, r_1, y_2, r_2|v)$ is the corresponding first-order coefficient. Consequently, we have

$$\log \frac{f^*(0, 1, 0, 0, 1, 1|v)}{f^*(1, 1, 0, 0, 0, 1|v)} \approx a(v, 0)'\alpha + h(v)\, b(v)'\beta,$$

where $h(v)$ is a correction factor which is simply equal to $\pi(1|v) = p(C = 1|v)$ when $C$ is
conditionally independent of $U$ given $V$. Finally, we have

$$f^*(y_2|v, 0, 0) \approx \frac{e^{y_2[a(v,0)'\alpha + h(v) b(v)'\beta]}}{1 + e^{a(v,0)'\alpha + h(v) b(v)'\beta}},$$

which is exactly the same expression given in (10).

On the basis of the above arguments, in the case of missing responses we propose to use a
two-step estimator which has the same structure as that described in Section 4.1 and is based
on a logit model of type (11) for the probability $\pi(1|v)$ of being a complier given the covariates.
Here, for each subject $i$, with $i = 1, \ldots, n$, we observe $v_i$, $r_{i1}$, $z_i$, $x_i$ and $r_{i2}$, where, for $h = 1, 2$,
$r_{ih}$ is the observed value of $R_h$. For the same subject we also observe $y_{i1}$ if $r_{i1} = 1$ and $y_{i2}$ if
$r_{i2} = 1$.

The estimator is based on the following steps which, as for the initial estimator, may be 
performed on the basis of standard estimation algorithms: 

1. Estimation of $\eta$. This is based on the observed values $v_i$ and $x_i$ for every $i$ such that
$z_i = 1$ and proceeds by maximizing the weighted log-likelihood

$$\ell_1(\eta) = \sum_i z_i\,[x_i \log \pi(1|v_i) + (1 - x_i) \log \pi(0|v_i)].$$

2. Estimation of $\alpha$ and $\beta$. This is performed by maximizing the conditional log-likelihood

$$\ell_2(\alpha, \beta; \hat{\eta}) = \sum_i d_i\, r_{i1}\, r_{i2}\, \ell_{i2}(\alpha, \beta; \hat{\eta}),$$

where $\ell_{i2}(\alpha, \beta; \eta)$ is defined as in (12).



Note that the only difference with respect to the estimator in Section 4.1 is in the second step,
where we consider only the subjects who respond at both occasions, whereas at the first step
we consider all subjects in order to estimate the model for the probability of being a complier.
Moreover, standard errors for the estimator can again be computed on the basis of the sandwich
formula (13).
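
The two steps can be sketched with plain logistic regressions, as follows. This is a minimal illustration rather than the authors' implementation, and it rests on several assumptions: the scalar parametrization $a(v,x) = (1 - x, x)'$ and $b(v) = 1$; the correction factor set equal to the fitted $\hat{\pi}(1|v)$; a single covariate with $m(v) = (1, v)'$; and $\delta = \alpha_2 - \alpha_1 - \beta$ as our reading of the causal effect on compliers. The helper names `fit_logit` and `two_step_estimate` are hypothetical, and sandwich standard errors are not computed.

```python
import numpy as np

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_logit(design, y, weights=None, n_iter=100, tol=1e-10):
    """Weighted logistic regression by Newton-Raphson (no implicit intercept)."""
    design = np.asarray(design, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones(len(y)) if weights is None else np.asarray(weights, dtype=float)
    coef = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = expit(design @ coef)
        score = design.T @ (w * (y - p))
        info = design.T @ (design * (w * p * (1.0 - p))[:, None])
        step = np.linalg.solve(info, score)
        coef = coef + step
        if np.max(np.abs(step)) < tol:
            break
    return coef

def two_step_estimate(v, z, x, y1, y2, r1, r2):
    v, z, x, y1, y2, r1, r2 = (np.asarray(a, dtype=float)
                               for a in (v, z, x, y1, y2, r1, r2))
    # Step 1: model for the probability to comply, fitted on the treatment arm
    # only (weight z_i), using all subjects regardless of response status.
    m = np.column_stack([np.ones_like(v), v])            # m(v) = (1, v)'
    eta = fit_logit(m, x, weights=z)
    pi1 = expit(m @ eta)
    # Step 2: conditional logistic step on discordant subjects who respond at
    # both occasions; the control arm gets the extra regressor pi1 * b(v).
    keep = (r1 == 1) & (r2 == 1) & (y1 + y2 == 1)
    design = np.column_stack([1.0 - x, x, (1.0 - z) * pi1])[keep]
    alpha1, alpha2, beta = fit_logit(design, y2[keep])
    return {"eta": eta, "alpha1": alpha1, "alpha2": alpha2, "beta": beta,
            "delta": alpha2 - alpha1 - beta}

# Toy data (assumed generating model): alpha1 = alpha2 = 1, beta = 0, delta = 0.
rng = np.random.default_rng(1)
n = 20000
v, u = rng.standard_normal(n), rng.standard_normal(n)
c = rng.binomial(1, expit(v))
z = rng.binomial(1, 0.5, size=n)
x = z * c
y1 = rng.binomial(1, expit(u + v))
y2 = rng.binomial(1, expit(u + v + 1.0))
r1 = rng.binomial(1, expit(1.0 + (u + v) / 2.0))
r2 = rng.binomial(1, expit(1.0 + (u + v) / 2.0 + c / 2.0 + x / 2.0))
est = two_step_estimate(v, z, x, y1, y2, r1, r2)
print(est)
```

The entries of `y1` and `y2` for non-respondents are never used in the second step, so any placeholder value can be stored there.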

5.3 Properties of the two-step estimator 

A final point concerns the properties of the estimators $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\delta}(v)$ with missing responses.
These estimators have the same asymptotic properties they have when the response variables
are always observable (see Section 4.2.1). The main result is that these estimators are consistent
when $\beta_0 = 0$, regardless of the parametrization used in the logit model (11) for the probability
to comply. When $\beta_0 \neq 0$, the estimators $\hat{\alpha}$ and $\hat{\beta}$ converge to $\alpha_*$ and $\beta_*$, respectively, as
$n \to \infty$. These limits are equal to the true values $\alpha_0$ and $\beta_0$ when $\beta_0 = 0$ and are expected to
be reasonably close to these true values when $\beta_0$ is not too far from 0. The same may be said
for the estimator $\hat{\delta}(v)$, which converges to $\delta_*(v)$.

To illustrate the above point, in Figure 4 we report some plots of $\delta_*$ with respect to $\beta$
under a true model involving only one observable covariate and based on the same assumptions
illustrated in Section 4.2.1, see in particular (14) and (15), together with the assumption that $R_1$ and
$R_2$ have Bernoulli distributions with probabilities of success chosen, respectively, as follows:

$$p(R_1 = 1|u, v) = \mathrm{expit}[1 + (u + v)/\sqrt{1 + \rho}/2],$$

$$p(R_2 = 1|u, v, c, x) = \mathrm{expit}[1 + (u + v)/\sqrt{1 + \rho}/2 + c/2 + x/2]. \qquad (16)$$

The estimators we considered are: 

• $\hat{\delta}_{null}$: two-step conditional logistic estimator of $\delta$ based on a model for the probability to
comply of type (11) with $m(v) = 1$;

• $\hat{\delta}_{cov}$: as above with $m(v) = (1, v)'$, so that the covariate is also used to predict the
probability to comply;

• $\hat{\delta}_{itt}$: ITT estimator based on the conditional logistic regression on only the subjects who
respond at both occasions, i.e. we regress $Y_2$ on $(1, Z)$ given $Y_+ = 1$, $R_1 = 1$ and $R_2 = 1$;

• $\hat{\delta}_{tr}$: TR estimator based on the conditional logistic regression of $Y_2$ on $(1, X)$ given $Y_+ = 1$,
$R_1 = 1$ and $R_2 = 1$.
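
Because the ITT and TR regressions involve an intercept and a single binary regressor, their maximum likelihood slope has a closed form: the difference between the logits of the sample proportions of $Y_2 = 1$ in the two groups, computed among discordant respondents. The sketch below uses this fact; the function name `conditional_logit_slope` and the toy data are illustrative assumptions, and both groups are assumed to have proportions strictly between 0 and 1.

```python
import numpy as np

def conditional_logit_slope(y1, y2, r1, r2, d):
    """Slope of the conditional logistic regression of Y2 on (1, d), d binary.

    Only subjects with R1 = R2 = 1 and Y1 + Y2 = 1 contribute. Pass d = Z for
    the ITT estimator and d = X for the TR estimator.
    """
    y1, y2, r1, r2, d = (np.asarray(a) for a in (y1, y2, r1, r2, d))
    keep = (r1 == 1) & (r2 == 1) & (y1 + y2 == 1)

    def logit_of_mean(mask):
        p = y2[mask].mean()
        return np.log(p / (1.0 - p))

    return logit_of_mean(keep & (d == 1)) - logit_of_mean(keep & (d == 0))

# Toy data: eight subjects, of which one is concordant and one does not
# respond at the second occasion; both are discarded.
y1 = [0, 1, 0, 0, 0, 1, 1, 0]
y2 = [1, 0, 1, 1, 1, 0, 1, 1]
r1 = [1, 1, 1, 1, 1, 1, 1, 1]
r2 = [1, 1, 1, 1, 1, 1, 1, 0]
z = [0, 0, 1, 1, 1, 1, 1, 0]
itt = conditional_logit_slope(y1, y2, r1, r2, z)
print(itt)  # log(0.75 / 0.25) - log(0.5 / 0.5) = log(3)
```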



The resulting plots closely resemble those in Figure 2 and then similar conclusions may be drawn 
about the proposed estimator. In particular, we again note the small distance between the limit 
in probability of the estimator and the true value of the parameter. 

Figure 4: Limit of the estimators $\hat{\delta}_{null}$ (dashed line), $\hat{\delta}_{cov}$ (solid line), $\hat{\delta}_{itt}$ (dotted line) and $\hat{\delta}_{tr}$
(dash-dotted line) under assumptions (14), (15) and (16), with $\rho = 0.00, 0.75$, $\beta$ between -1 and
1 and $\delta = 0, 1$, with $\alpha_1$ and $\alpha_2$ defined accordingly.



Under the same true model assumed above, we studied by simulation the finite-sample
properties of the estimators $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\delta}$. As usual, we focused on the estimators which exploit
the covariate to predict the probability to comply, and then we let $m(v) = (1, v)'$ in (11). Under
the same setting of the simulations in Section 4.2.2, we obtained the results reported in Tables
3 and 4 when (16) is assumed. These results are very similar to those reported in Tables 1 and
2 for the case in which the response variables are always observed. The main difference is in
the variability of the estimators, which is obviously larger because of the presence of missing
responses.

Table 3: Simulation results for the proposed two-step estimator based on 1000 samples of size
$n = 500, 1000$ generated under different models based on assumptions (14), (15) and (16),
with $\rho = 0.00, 0.75$ and different values of $\alpha_1$ and $\beta$ (in each case $\alpha_2 = 1$ and $\delta = 0$).

Table 4: Simulation results for the proposed two-step estimator based on 1000 samples of size
$n = 500, 1000$ generated under different models based on assumptions (14), (15) and (16),
with $\rho = 0.00, 0.75$ and different values of $\alpha_1$ and $\beta$ (in each case $\alpha_2 = 2$ and $\delta = 1$).

6 An application

To illustrate the approach proposed in this paper, we analyzed the dataset coming from the
randomized experiment on BSE already mentioned in Section 1.

The study took place between the beginning of 1988 and the end of 1990 at the Oncologic
Center of the Faenza District, Italy. The sample used in the study consists of 657 women
aged 20 to 64 years, who were randomly assigned to the control, consisting of learning how to
perform BSE through a standard method, or to the treatment, consisting of a training course
held by specialized medical staff. Only women assigned to the treatment could access it, so
non-compliance may be observed only among these subjects. In particular, of the 330 women
randomly assigned to the treatment, 182 attended the course and so they may be considered as
compliers; the remaining women may be considered as never-takers.



The efficacy of the treatment is measured by two binary response variables, observed before
and after the treatment/control, which indicate if BSE is regularly practised and if the quality of
BSE practice is adequate. Several covariates are also available, such as age, number of children,
educational level, occupational status, presence of previous cancer pathologies in the woman or
her family, menopause and adequate knowledge of breast pathophysiology. Finally, some response
variables are not observed and these have to be treated as missing.

The dataset has already been analyzed by [5] , on the basis of a standard conditional logistic 
approach, and by [6] , who exploited a potential outcome approach allowing for missing responses, 
which is related to that of [21]. 

In analyzing the dataset, we first considered the effect of the treatment on practising BSE.
In this case, $Y_1$ is equal to 1 if a woman regularly practises BSE before the treatment and to 0
otherwise. Similarly, $Y_2$ is equal to 1 if a woman regularly practises BSE after the treatment
and to 0 otherwise. The first variable was observed for 93.61% of the sample and the second
for 65.30%. We then followed the method for missing responses described in Section 5 under
assumption (2) for the parametrization of the causal model. In particular, we first computed the
estimators $\hat{\alpha}_{null}$, $\hat{\beta}_{null}$ and $\hat{\delta}_{null}$, based on predicting the probability to comply only on the basis
of the indicator variable for the second response variable being observable, and the estimators
$\hat{\alpha}_{cov}$, $\hat{\beta}_{cov}$ and $\hat{\delta}_{cov}$, which also consider the covariates age and age-squared in the model used
to predict this probability. These covariates are included since they are among those with the most
significant effect on the probability to comply. We also considered the ITT estimator $\hat{\delta}_{itt}$ and
the TR estimator $\hat{\delta}_{tr}$ defined as in Section 5.3. The results are displayed in Table 5.

Our first conclusion is that the inclusion of the covariates in the model for the probability
to comply does not dramatically affect the estimates of the parameters and of the causal effect
computed following our approach. In particular, the estimate of $\alpha_2$ remains unchanged by
the inclusion of these covariates, since this estimate exploits only the data deriving from the
treatment arm. Overall, we can observe an effect of the control on never-takers, corresponding
to $\alpha_1$, which is not significant. Moreover, the estimate of the parameter $\beta$ is very different from
zero, indicating a great difference between compliers and never-takers as regards this
effect. Then, we conclude that the effect of the treatment over control on practising BSE ($\delta$) is
not significant. A similar conclusion is reached on the basis of the ITT estimator of $\delta$, whereas
the TR estimator attains a value much higher than that of the other estimators, since it does not
distinguish between compliers and never-takers as regards the effect of the treatment.

Table 5: Estimates of the causal parameters obtained on the basis of the proposed approach and
the ITT and TR approaches when the response variable is equal to 1 for a woman regularly
practising BSE and to 0 otherwise.

We then considered the effect of the treatment on the quality of the BSE practice. As in
[?], this is measured through the binary response variables $Y_1$ and $Y_2$ (here redefined) which are
equal to 1 if the score assigned by the medical staff to the quality of the BSE practice is greater
than the sample median and to 0 otherwise. As usual, $Y_1$ is a pre-treatment variable and $Y_2$
is a post-treatment variable. Obviously, these variables are observable only if BSE is practised
and so we again used the method for missing responses described in Section 5. In particular, $Y_1$
was observed for 54.80% of the sample and $Y_2$ for 51.93%. The results obtained from
the application of the same estimators mentioned above are reported in Table 6.

In this case, the inclusion of the covariates age and age-squared in the model for predicting
the probability to comply has a slight effect on the estimates of the parameters $\alpha$, $\beta$ and
$\delta$ computed on the basis of the proposed approach. Never-takers and compliers now appear
to be less distant in terms of reaction to the control, whose effect is not significant for either
subpopulation. On the other hand, the effect of the treatment on compliers is significant, as
is the causal effect of the treatment over control. The estimate of this causal effect is in
this case close to the TR estimate, whereas the ITT estimate is much smaller, even if it remains
significantly greater than 0.

Overall, the results obtained with the proposed approach are in accordance with those of [5]
and [6], who concluded that the training course does not have a significant effect on practising BSE,
but it has a significant effect on the quality of the BSE practice.

Table 6: Estimates of the causal parameters obtained on the basis of the proposed approach and
the ITT and TR approaches when the response variable is equal to 1 when the quality of the
BSE practice is adequate and to 0 otherwise.

7 Discussion 

A causal model has been introduced to study the behavior of the conditional logistic
estimator as a tool of analysis of data coming from two-arm experimental studies with possible
non-compliance. The model is applicable with binary outcomes observed before and after the
treatment (or control). It is formulated on the basis of latent variables for the effect of
unobservable covariates at both occasions and to account for the difference between compliers and
never-takers in terms of reaction to control and treatment. A correction for the bias of the
conditional logistic estimator has also been proposed which can be exploited when we want to
estimate the causal effect of the treatment over control in the subpopulation of compliers. The
result is a two-step estimator which has some connection with the estimators of [14] and [15]
and represents an extension of the standard conditional logistic estimator for this type of
experiment. This estimator may be simply computed through standard algorithms for logistic
regression and does not require assumptions on the distribution of the latent
variables given the covariates. It also has interesting asymptotic and finite-sample properties which
are maintained even with missing responses.

One of the basic assumptions on which the approach relies is that a subject is assigned
to the control arm or to the treatment arm with a probability depending only on the
observable covariates and not on the pre-treatment response variable. Indeed, we could relax this
assumption, but we would have more complex expressions for the conditional probability of
the response variables given their sum. Similarly, the approach can be extended to the case in
which subjects assigned to both arms can access the treatment and then non-compliance may
also exist in the control arm, i.e. certain subjects assigned to the control may decide instead
to take the treatment. The model presented in Section 2 can be easily extended to this case.
Using the terminology of [7], we have to consider the subpopulations of compliers, never-takers
and always-takers. By exploiting an approximation similar to that illustrated in Section 3, we
can set up an estimator of the causal effect of the treatment also in this case. The causal effect
is again referred to the subpopulation of compliers and is measured on the logit scale. The
approach would be complicated by the fact that the true model involves a mixture of three
subpopulations. Moreover, the effect of the control on never-takers and that of the treatment
on compliers are not directly observable from the treatment arm as they were in the original model.
However, the resulting estimator would retain simplicity as its main advantage, being based
on a series of logistic regressions with suitable design matrices, which can be performed by
standard algorithms.

As a final comment, consider that, driven by the application to the BSE dataset, we only
considered the case of repeated binary response variables. However, the approach may be easily
extended to the case of response variables having a different nature (e.g. counts), provided
that the conditional distribution of these variables belongs to the natural exponential family
and the causal effect is measured on a scale defined according to the canonical link function for
the adopted distribution [22].

Appendix



A1: Mathematical details on the approximation

In order to derive (8) consider that

$$\log g(0,1|v) - \log g(1,0|v) \approx \log g_0(0,1|v) - \log g_0(1,0|v) + b(v)'\beta\,[h(0,1|v) - h(1,0|v)].$$

Moreover, the first difference at the right-hand side is equal to $a(v, 0)'\alpha$, whereas

$$h(0,1|v) = \frac{1}{g_0(1,0|v)\, e^{t(0,v,0)}} \int \frac{e^{\lambda(u,v)}}{1 + e^{\lambda(u,v)}}\, \frac{e^{t(0,v,0)}}{[1 + e^{\lambda(u,v) + t(0,v,0)}]^2}\, \pi(1|u,v)\, \phi(u|v)\,du$$

and then, computing $h(1,0|v)$ in the same way, $h(0,1|v) - h(1,0|v) = h(v)$ as defined in (9).
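
The quality of the first-order approximation can be checked numerically in a simple special case. The sketch below is an illustration under stated assumptions, not a result from the paper: responses always observed, $\lambda(u,v) = u + v$ with $U \sim N(0,1)$, $C$ independent of $U$ given $V$ (so that the correction factor equals $\pi(1|v)$), and scalar $a(v,0)'\alpha = \tau$, $b(v) = 1$. It compares the exact log odds $\log[g(0,1|v)/g(1,0|v)]$, obtained by Gauss-Hermite quadrature, with the approximation $\tau + \pi(1|v)\beta$. The function name `exact_log_odds` is hypothetical.

```python
import numpy as np

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

# Gauss-Hermite nodes and weights for a standard normal U.
nodes, weights = np.polynomial.hermite_e.hermegauss(80)
weights = weights / weights.sum()      # normalise so the weights sum to 1

def exact_log_odds(tau, beta, pi1, v=0.0):
    """log[g(0,1|v) / g(1,0|v)] in the control arm, by numerical integration."""
    lam = nodes + v                    # assumed logit of p(Y1 = 1 | u, v)

    def g(y1, y2):
        total = 0.0
        for c, pc in ((0, 1.0 - pi1), (1, pi1)):
            t = tau + c * beta         # t(c, v, 0) with b(v) = 1
            p1 = expit(lam) if y1 == 1 else 1.0 - expit(lam)
            p2 = expit(lam + t) if y2 == 1 else 1.0 - expit(lam + t)
            total += pc * np.sum(weights * p1 * p2)
        return total

    return np.log(g(0, 1) / g(1, 0))

tau, pi1 = 1.0, 0.5
err_at_zero = exact_log_odds(tau, 0.0, pi1) - (tau + pi1 * 0.0)
err_small = exact_log_odds(tau, 0.1, pi1) - (tau + pi1 * 0.1)
err_large = exact_log_odds(tau, 1.0, pi1) - (tau + pi1 * 1.0)
print(err_at_zero, err_small, err_large)
```

At $\beta = 0$ the two quantities coincide, since the ratio of the integrands is then $e^{\tau}$ for every $u$; the discrepancy grows as $\beta$ moves away from 0.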



A2: Computation of $\hat{\Sigma}(\hat{\eta}, \hat{\theta})$

The derivative of s{r],9) has the following structure 

Finally, we have

The estimate of the variance-covariance matrix of the score may be expressed as